Molecular Phylogenetic
   HOME

TheInfoList



OR:

Molecular phylogenetics () is the branch of
phylogeny A phylogenetic tree (also phylogeny or evolutionary tree Felsenstein J. (2004). ''Inferring Phylogenies'' Sinauer Associates: Sunderland, MA.) is a branching diagram or a tree showing the evolutionary relationships among various biological spec ...
that analyzes genetic, hereditary molecular differences, predominantly in DNA sequences, to gain information on an organism's evolutionary relationships. From these analyses, it is possible to determine the processes by which diversity among species has been achieved. The result of a molecular
phylogenetic In biology, phylogenetics (; from Greek φυλή/ φῦλον [] "tribe, clan, race", and wikt:γενετικός, γενετικός [] "origin, source, birth") is the study of the evolutionary history and relationships among or within groups o ...
analysis is expressed in a
phylogenetic tree A phylogenetic tree (also phylogeny or evolutionary tree Felsenstein J. (2004). ''Inferring Phylogenies'' Sinauer Associates: Sunderland, MA.) is a branching diagram or a tree showing the evolutionary relationships among various biological spec ...
. Molecular phylogenetics is one aspect of molecular systematics, a broader term that also includes the use of molecular data in
taxonomy Taxonomy is the practice and science of categorization or classification. A taxonomy (or taxonomical classification) is a scheme of classification, especially a hierarchical classification, in which things are organized into groups or types. ...
and
biogeography Biogeography is the study of the distribution of species and ecosystems in geographic space and through geological time. Organisms and biological communities often vary in a regular fashion along geographic gradients of latitude, elevation, ...
. Molecular phylogenetics and
molecular evolution Molecular evolution is the process of change in the sequence composition of cellular molecules such as DNA, RNA, and proteins across generations. The field of molecular evolution uses principles of evolutionary biology and population genetics ...
correlate. Molecular evolution is the process of selective changes (mutations) at a molecular level (genes, proteins, etc.) throughout various branches in the tree of life (evolution). Molecular phylogenetics makes inferences of the evolutionary relationships that arise due to molecular evolution and results in the construction of a phylogenetic tree.


History

The theoretical frameworks for molecular
systematics Biological systematics is the study of the diversification of living forms, both past and present, and the relationships among living things through time. Relationships are visualized as evolutionary trees (synonyms: cladograms, phylogenetic tre ...
were laid in the 1960s in the works of
Emile Zuckerkandl Émile Zuckerkandl (July 4, 1922 – November 9, 2013) was an Austrian-born French biologist considered one of the founders of the field of molecular evolution. He introduced, with Linus Pauling, the concept of the "molecular clock", which enab ...
,
Emanuel Margoliash Emanuel Margoliash (February 10, 1920 – April 10, 2008) was a biochemist who spent much of his career studying the protein cytochrome c. He is best known for his work on molecular evolution; with Walter Fitch, he devised Fitch-Margoliash met ...
,
Linus Pauling Linus Carl Pauling (; February 28, 1901August 19, 1994) was an American chemist, biochemist, chemical engineer, peace activist, author, and educator. He published more than 1,200 papers and books, of which about 850 dealt with scientific top ...
, and
Walter M. Fitch Walter Monroe Fitch (May 21, 1929 – March 10, 2011) was a pioneering American researcher in molecular evolution. Education and career Fitch attended University of California, Berkeley, where he graduated with an A.B. in chemistry in 1953 and a ...
. Applications of molecular systematics were pioneered by
Charles G. Sibley Charles Gald Sibley (August 7, 1917 – April 12, 1998) was an American ornithologist and molecular biologist. He had an immense influence on the scientific classification of birds, and the work that Sibley initiated has substantially altered our u ...
(
bird Birds are a group of warm-blooded vertebrates constituting the class Aves (), characterised by feathers, toothless beaked jaws, the laying of hard-shelled eggs, a high metabolic rate, a four-chambered heart, and a strong yet lightweigh ...
s),
Herbert C. Dessauer Herbert Clay Dessauer (30 December 1921 – 8 February 2013) was an American biochemist, and a pioneer in the use of molecular systematics to clarify the evolutionary relationships of anole Dactyloidae are a family of lizards commonly kno ...
( herpetology), and Morris Goodman (
primate Primates are a diverse order of mammals. They are divided into the strepsirrhines, which include the lemurs, galagos, and lorisids, and the haplorhines, which include the tarsiers and the simians (monkeys and apes, the latter including huma ...
s), followed by Allan C. Wilson,
Robert K. Selander Robert Keith Selander (July 21, 1927 – June 14, 2015) was an American evolutionary biologist and emeritus professor at Pennsylvania State University. Known for his studies of molecular genetics Molecular genetics is a sub-field of biology t ...
, and John C. Avise (who studied various groups). Work with
protein electrophoresis Protein electrophoresis is a method for analysing the proteins in a fluid or an extract. The electrophoresis may be performed with a small volume of sample in a number of alternative ways with or without a supporting medium: SDS polyacrylamide gel ...
began around 1956. Although the results were not quantitative and did not initially improve on morphological classification, they provided tantalizing hints that long-held notions of the classifications of
bird Birds are a group of warm-blooded vertebrates constituting the class Aves (), characterised by feathers, toothless beaked jaws, the laying of hard-shelled eggs, a high metabolic rate, a four-chambered heart, and a strong yet lightweigh ...
s, for example, needed substantial revision. In the period of 1974–1986, DNA-DNA hybridization was the dominant technique used to measure genetic difference.


Theoretical background

Early attempts at molecular systematics were also termed as
chemotaxonomy Webster's Dictionary, Merriam-Webster defines ''chemotaxonomy'' as the method of biology, biological classification based on similarities and dissimilarity in the structure of certain chemical compound, compounds among the organisms being classifi ...
and made use of proteins,
enzyme Enzymes () are proteins that act as biological catalysts by accelerating chemical reactions. The molecules upon which enzymes may act are called substrates, and the enzyme converts the substrates into different molecules known as products. A ...
s,
carbohydrate In organic chemistry, a carbohydrate () is a biomolecule consisting of carbon (C), hydrogen (H) and oxygen (O) atoms, usually with a hydrogen–oxygen atom ratio of 2:1 (as in water) and thus with the empirical formula (where ''m'' may or ma ...
s, and other molecules that were separated and characterized using techniques such as
chromatography In chemical analysis, chromatography is a laboratory technique for the separation of a mixture into its components. The mixture is dissolved in a fluid solvent (gas or liquid) called the ''mobile phase'', which carries it through a system ( ...
. These have been replaced in recent times largely by
DNA sequencing DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. Th ...
, which produces the exact sequences of
nucleotides Nucleotides are organic molecules consisting of a nucleoside and a phosphate. They serve as monomeric units of the nucleic acid polymers – deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), both of which are essential biomolecules w ...
or ''bases'' in either DNA or RNA segments extracted using different techniques. In general, these are considered superior for evolutionary studies, since the actions of evolution are ultimately reflected in the genetic sequences. At present, it is still a long and expensive process to sequence the entire DNA of an organism (its
genome In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding ge ...
). However, it is quite feasible to determine the sequence of a defined area of a particular
chromosome A chromosome is a long DNA molecule with part or all of the genetic material of an organism. In most chromosomes the very long thin DNA fibers are coated with packaging proteins; in eukaryotic cells the most important of these proteins are ...
. Typical molecular systematic analyses require the sequencing of around 1000
base pair A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both DNA ...
s. At any location within such a sequence, the bases found in a given position may vary between organisms. The particular sequence found in a given organism is referred to as its
haplotype A haplotype ( haploid genotype) is a group of alleles in an organism that are inherited together from a single parent. Many organisms contain genetic material ( DNA) which is inherited from two parents. Normally these organisms have their DNA or ...
. In principle, since there are four base types, with 1000 base pairs, we could have 41000 distinct haplotypes. However, for organisms within a particular species or in a group of related species, it has been found empirically that only a minority of sites show any variation at all, and most of the variations that are found are correlated, so that the number of distinct haplotypes that are found is relatively small. In a molecular systematic analysis, the haplotypes are determined for a defined area of
genetic material Nucleic acids are biopolymers, macromolecules, essential to all known forms of life. They are composed of nucleotides, which are the monomers made of three components: a 5-carbon sugar, a phosphate group and a nitrogenous base. The two main clas ...
; a substantial sample of individuals of the target
species In biology, a species is the basic unit of classification and a taxonomic rank of an organism, as well as a unit of biodiversity. A species is often defined as the largest group of organisms in which any two individuals of the appropriate s ...
or other
taxon In biology, a taxon (back-formation from ''taxonomy''; plural taxa) is a group of one or more populations of an organism or organisms seen by taxonomists to form a unit. Although neither is required, a taxon is usually known by a particular nam ...
is used; however, many current studies are based on single individuals. Haplotypes of individuals of closely related, yet different, taxa are also determined. Finally, haplotypes from a smaller number of individuals from a definitely different taxon are determined: these are referred to as an outgroup. The base sequences for the haplotypes are then compared. In the simplest case, the difference between two haplotypes is assessed by counting the number of locations where they have different bases: this is referred to as the number of ''substitutions'' (other kinds of differences between haplotypes can also occur, for example, the ''insertion'' of a section of
nucleic acid Nucleic acids are biopolymers, macromolecules, essential to all known forms of life. They are composed of nucleotides, which are the monomers made of three components: a 5-carbon sugar, a phosphate group and a nitrogenous base. The two main cl ...
in one haplotype that is not present in another). The difference between organisms is usually re-expressed as a ''percentage divergence'', by dividing the number of substitutions by the number of base pairs analysed: the hope is that this measure will be independent of the location and length of the section of DNA that is sequenced. An older and superseded approach was to determine the divergences between the
genotype The genotype of an organism is its complete set of genetic material. Genotype can also be used to refer to the alleles or variants an individual carries in a particular gene or genetic location. The number of alleles an individual can have in a ...
s of individuals by DNA-DNA hybridization. The advantage claimed for using hybridization rather than gene sequencing was that it was based on the entire genotype, rather than on particular sections of DNA. Modern sequence comparison techniques overcome this objection by the use of multiple sequences. Once the divergences between all pairs of samples have been determined, the resulting
triangular matrix In mathematics, a triangular matrix is a special kind of square matrix. A square matrix is called if all the entries ''above'' the main diagonal are zero. Similarly, a square matrix is called if all the entries ''below'' the main diagonal are ...
of differences is submitted to some form of statistical
cluster analysis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of ...
, and the resulting
dendrogram A dendrogram is a diagram representing a tree. This diagrammatic representation is frequently used in different contexts: * in hierarchical clustering, it illustrates the arrangement of the clusters produced by the corresponding analyses. ...
is examined in order to see whether the samples cluster in the way that would be expected from current ideas about the taxonomy of the group. Any group of haplotypes that are all more similar to one another than any of them is to any other haplotype may be said to constitute a
clade A clade (), also known as a monophyletic group or natural group, is a group of organisms that are monophyletic – that is, composed of a common ancestor and all its lineal descendants – on a phylogenetic tree. Rather than the English term, ...
, which may be visually represented as the figure displayed on the right demonstrates. Statistical techniques such as
bootstrapping In general, bootstrapping usually refers to a self-starting process that is supposed to continue or grow without external input. Etymology Tall boots may have a tab, loop or handle at the top known as a bootstrap, allowing one to use fingers ...
and
jackknifing Jackknifing is the folding of an articulated vehicle so that it resembles the acute angle of a folding pocket knife. If a vehicle towing a trailer skids, the trailer can push the towing vehicle from behind until it spins the vehicle around and ...
help in providing reliability estimates for the positions of haplotypes within the evolutionary trees.


Techniques and applications

Every living
organism In biology, an organism () is any living system that functions as an individual entity. All organisms are composed of cells (cell theory). Organisms are classified by taxonomy into groups such as multicellular animals, plants, and ...
contains deoxyribonucleic acid ( DNA), ribonucleic acid (
RNA Ribonucleic acid (RNA) is a polymeric molecule essential in various biological roles in coding, decoding, regulation and expression of genes. RNA and deoxyribonucleic acid ( DNA) are nucleic acids. Along with lipids, proteins, and carbohydra ...
), and
protein Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, respo ...
s. In general, closely related organisms have a high degree of similarity in the
molecular structure Molecular geometry is the three-dimensional arrangement of the atoms that constitute a molecule. It includes the general shape of the molecule as well as bond lengths, bond angles, torsional angles and any other geometrical parameters that deter ...
of these substances, while the molecules of organisms distantly related often show a pattern of dissimilarity. Conserved sequences, such as mitochondrial DNA, are expected to accumulate mutations over time, and assuming a constant rate of mutation, provide a
molecular clock The molecular clock is a figurative term for a technique that uses the mutation rate of biomolecules to deduce the time in prehistory when two or more life forms diverged. The biomolecular data used for such calculations are usually nucleoti ...
for dating divergence. Molecular phylogeny uses such data to build a "relationship tree" that shows the probable
evolution Evolution is change in the heritable characteristics of biological populations over successive generations. These characteristics are the expressions of genes, which are passed on from parent to offspring during reproduction. Variation ...
of various organisms. With the invention of
Sanger sequencing Sanger sequencing is a method of DNA sequencing that involves electrophoresis and is based on the random incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. After first being developed by Fred ...
in 1977, it became possible to isolate and identify these molecular structures.
High-throughput sequencing DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. Th ...
may also be used to obtain the
transcriptome The transcriptome is the set of all RNA transcripts, including coding and non-coding, in an individual or a population of cells. The term can also sometimes be used to refer to all RNAs, or just mRNA, depending on the particular experiment. The t ...
of an organism, allowing inference of phylogenetic relationships using transcriptomic data. The most common approach is the comparison of
homologous sequence Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of three phenomena: either a s ...
s for genes using
sequence alignment In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Alig ...
techniques to identify similarity. Another application of molecular phylogeny is in
DNA barcoding DNA barcoding is a method of species identification using a short section of DNA from a specific gene or genes. The premise of DNA barcoding is that by comparison with a reference library of such DNA sections (also called " sequences"), an indi ...
, wherein the species of an individual organism is identified using small sections of
mitochondrial DNA Mitochondrial DNA (mtDNA or mDNA) is the DNA located in mitochondria, cellular organelles within eukaryotic cells that convert chemical energy from food into a form that cells can use, such as adenosine triphosphate (ATP). Mitochondrial D ...
or
chloroplast DNA Chloroplast DNA (cpDNA) is the DNA located in chloroplasts, which are photosynthetic organelles located within the cells of some eukaryotic organisms. Chloroplasts, like other types of plastid, contain a genome separate from that in the cell nu ...
. Another application of the techniques that make this possible can be seen in the very limited field of human genetics, such as the ever-more-popular use of
genetic testing Genetic testing, also known as DNA testing, is used to identify changes in DNA sequence or chromosome structure. Genetic testing can also include measuring the results of genetic changes, such as RNA analysis as an output of gene expression, or ...
to determine a child's
paternity Paternity may refer to: *Father, the male parent of a (human) child *Paternity (law), fatherhood as a matter of law * ''Paternity'' (film), a 1981 comedy film starring Burt Reynolds * "Paternity" (''House''), a 2004 episode of the television seri ...
, as well as the emergence of a new branch of criminal forensics focused on evidence known as
genetic fingerprinting DNA profiling (also called DNA fingerprinting) is the process of determining an individual's DNA characteristics. DNA analysis intended to identify a species, rather than an individual, is called DNA barcoding. DNA profiling is a forensic t ...
.


Molecular phylogenetic analysis

There are several methods available for performing a molecular phylogenetic analysis. One method, including a comprehensive step-by-step protocol on constructing a phylogenetic tree, including DNA/Amino Acid contiguous sequence assembly,
multiple sequence alignment Multiple sequence alignment (MSA) may refer to the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences are assumed to have an evolutio ...
, model-test (testing best-fitting substitution models), and phylogeny reconstruction using Maximum Likelihood and Bayesian Inference, is available at Nature Protocol. Another molecular phylogenetic analysis technique has been described by Pevsner and shall be summarized in the sentences to follow (Pevsner, 2015). A phylogenetic analysis typically consists of five major steps. The first stage comprises sequence acquisition. The following step consists of performing a multiple sequence alignment, which is the fundamental basis of constructing a phylogenetic tree. The third stage includes different models of DNA and amino acid substitution. Several models of substitution exist. A few examples include
Hamming distance In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of ''substitutions'' required to chan ...
, the Jukes and Cantor one-parameter model, and the Kimura two-parameter model (see
Models of DNA evolution A number of different Markov models of DNA sequence evolution have been proposed. These substitution models differ in terms of the parameters used to describe the rates at which one nucleotide replaces another during evolution. These models are ...
). The fourth stage consists of various methods of tree building, including distance-based and character-based methods. The normalized Hamming distance and the Jukes-Cantor correction formulas provide the degree of divergence and the probability that a nucleotide changes to another, respectively. Common tree-building methods include unweighted pair group method using arithmetic mean (
UPGMA UPGMA (unweighted pair group method with arithmetic mean) is a simple agglomerative (bottom-up) hierarchical clustering method. The method is generally attributed to Sokal and Michener. The UPGMA method is similar to its ''weighted'' variant, the ...
) and
Neighbor joining In bioinformatics, neighbor joining is a bottom-up (agglomerative) clustering method for the creation of phylogenetic trees, created by Naruya Saitou and Masatoshi Nei in 1987. Usually based on DNA or protein sequence data, the algorithm require ...
, which are distance-based methods,
Maximum parsimony In phylogenetics, maximum parsimony is an optimality criterion under which the phylogenetic tree that minimizes the total number of character-state changes (or miminizes the cost of differentially weighted character-state changes) is preferred. ...
, which is a character-based method, and
Maximum likelihood estimation In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statis ...
and
Bayesian inference Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, a ...
, which are character-based/model-based methods. UPGMA is a simple method; however, it is less accurate than the neighbor-joining approach. Finally, the last step comprises evaluating the trees. This assessment of accuracy is composed of consistency, efficiency, and robustness. MEGA (molecular evolutionary genetics analysis) is an analysis software that is user-friendly and free to download and use. This software is capable of analyzing both distance-based and character-based tree methodologies. MEGA also contains several options one may choose to utilize, such as heuristic approaches and bootstrapping.
Bootstrapping In general, bootstrapping usually refers to a self-starting process that is supposed to continue or grow without external input. Etymology Tall boots may have a tab, loop or handle at the top known as a bootstrap, allowing one to use fingers ...
is an approach that is commonly used to measure the robustness of topology in a phylogenetic tree, which demonstrates the percentage each clade is supported after numerous replicates. In general, a value greater than 70% is considered significant. The flow chart displayed on the right visually demonstrates the order of the five stages of Pevsner's molecular phylogenetic analysis technique that have been described.


Limitations

Molecular systematics is an essentially
cladistic Cladistics (; ) is an approach to biological classification in which organisms are categorized in groups (" clades") based on hypotheses of most recent common ancestry. The evidence for hypothesized relationships is typically shared derived char ...
approach: it assumes that classification must correspond to phylogenetic descent, and that all valid taxa must be
monophyletic In cladistics for a group of organisms, monophyly is the condition of being a clade—that is, a group of taxa composed only of a common ancestor (or more precisely an ancestral population) and all of its lineal descendants. Monophyletic gro ...
. This is a limitation when attempting to determine the optimal tree(s), which often involves bisecting and reconnecting portions of the phylogenetic tree(s). The recent discovery of extensive
horizontal gene transfer Horizontal gene transfer (HGT) or lateral gene transfer (LGT) is the movement of genetic material between Unicellular organism, unicellular and/or multicellular organisms other than by the ("vertical") transmission of DNA from parent to offsprin ...
among organisms provides a significant complication to molecular systematics, indicating that different genes within the same organism can have different phylogenies. In addition, molecular phylogenies are sensitive to the assumptions and models that go into making them. Firstly, sequences must be aligned; then, issues such as
long-branch attraction In phylogenetics, long branch attraction (LBA) is a form of systematic error whereby distantly related lineages are incorrectly inferred to be closely related. LBA arises when the amount of molecular or morphological change accumulated within a lin ...
,
saturation Saturation, saturated, unsaturation or unsaturated may refer to: Chemistry * Saturation, a property of organic compounds referring to carbon-carbon bonds **Saturated and unsaturated compounds ** Degree of unsaturation **Saturated fat or fatty aci ...
, and
taxon In biology, a taxon (back-formation from ''taxonomy''; plural taxa) is a group of one or more populations of an organism or organisms seen by taxonomists to form a unit. Although neither is required, a taxon is usually known by a particular nam ...
sampling problems must be addressed. This means that strikingly different results can be obtained by applying different models to the same dataset. Moreover, as previously mentioned, UPGMA is a simple approach in which the tree is always rooted. The algorithm assumes a constant molecular clock for sequences in the tree. This is associated with being a limitation in that if unequal substitution rates exist, the result may be an incorrect tree.


See also

*
Computational phylogenetics Computational phylogenetics is the application of computational algorithms, methods, and programs to phylogenetic
*
Microbial phylogenetics Microbial phylogenetics is the study of the manner in which various groups of microorganisms are genetically related. This helps to trace their evolution. To study these relationships biologists rely on comparative genomics, as physiology and compa ...
*
Molecular clock The molecular clock is a figurative term for a technique that uses the mutation rate of biomolecules to deduce the time in prehistory when two or more life forms diverged. The biomolecular data used for such calculations are usually nucleoti ...
*
Molecular evolution Molecular evolution is the process of change in the sequence composition of cellular molecules such as DNA, RNA, and proteins across generations. The field of molecular evolution uses principles of evolutionary biology and population genetics ...
*
PhyloCode The ''International Code of Phylogenetic Nomenclature'', known as the ''PhyloCode'' for short, is a formal set of rules governing phylogenetic nomenclature. Its current version is specifically designed to regulate the naming of clades, leaving the ...
*
Phylogenetic nomenclature Phylogenetic nomenclature is a method of nomenclature for taxa in biology that uses phylogenetic definitions for taxon names as explained below. This contrasts with the traditional approach, in which taxon names are defined by a '' type'', which ...


Notes and references


Further reading

* *


External links


NCBI – Systematics and Molecular PhylogeneticsMEGA Software
*Molecular phylogenetics
/span> from ''Encyclopædia Britannica''. {{Bioinformatics Phylogenetics Molecular evolution